 invariant distribution



1b9a80606d74d3da6db2f1274557e644-Paper.pdf

Neural Information Processing Systems

Their difference, albeit obvious, should also be emphasized: GD is deterministic, and the same constant initial condition will always lead to the same iterates.
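As a minimal illustration of the determinism noted above, the sketch below runs plain gradient descent twice from the same constant initial condition and compares the iterates. The quadratic objective, step size, and function names are illustrative assumptions, not taken from the paper.

```python
def gradient_descent(grad, x0, step_size, num_steps):
    """Plain gradient descent; returns the list of all iterates."""
    iterates = [x0]
    x = x0
    for _ in range(num_steps):
        x = x - step_size * grad(x)
        iterates.append(x)
    return iterates


def quadratic_grad(x):
    """Gradient of the hypothetical objective f(x) = 0.5 * x**2."""
    return x


# Two runs with the same constant initial condition.
run_a = gradient_descent(quadratic_grad, x0=1.0, step_size=0.1, num_steps=20)
run_b = gradient_descent(quadratic_grad, x0=1.0, step_size=0.1, num_steps=20)
print(run_a == run_b)  # True: no randomness enters the update
```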


Thompson sampling: Precise arm-pull dynamics and adaptive inference

Han, Qiyang

arXiv.org Machine Learning

Adaptive sampling schemes are well known to create complex dependence that may invalidate conventional inference methods. A recent line of work shows that this need not be the case for UCB-type algorithms in multi-armed bandits. A central emerging theme is a 'stability' property with asymptotically deterministic arm-pull counts in these algorithms, making inference as easy as in the i.i.d. setting. In this paper, we study the precise arm-pull dynamics in another canonical class of Thompson-sampling type algorithms. We show that the phenomenology is qualitatively different: the arm-pull count is asymptotically deterministic if and only if the arm is suboptimal or is the unique optimal arm; otherwise it converges in distribution to the unique invariant law of an SDE. This dichotomy uncovers a unifying principle behind many existing (in)stability results: an arm is stable if and only if its interaction with statistical noise is asymptotically negligible. As an application, we show that normalized arm means obey the same dichotomy, with Gaussian limits for stable arms and a semi-universal, non-Gaussian limit for unstable arms. This not only enables the construction of confidence intervals for the unknown mean rewards despite non-normality, but also reveals the potential of developing tractable inference procedures beyond the stable regime. The proofs rely on two new approaches. For suboptimal arms, we develop an 'inverse process' approach that characterizes the inverse of the arm-pull count process via a Stieltjes integral. For optimal arms, we adopt a reparametrization of the arm-pull and noise processes that reduces the singularity in the natural SDE to proving the uniqueness of the invariant law of another SDE. We prove the latter by a set of analytic tools, including the parabolic Hörmander condition and the Stroock-Varadhan support theorem.
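The dichotomy described in this abstract can be explored numerically with a small simulation. The sketch below is not the paper's algorithm: it assumes Gaussian rewards with unit variance and a flat prior, so each posterior sample is drawn from N(empirical mean, 1/pull count). The function names, horizon, and seeds are hypothetical. It prints the fraction of pulls given to arm 0 over several independent runs, once with a unique optimal arm and once with two tied optimal arms.

```python
import random


def thompson_sampling(means, horizon, seed):
    """Gaussian Thompson sampling with unit-variance rewards.

    Assumption: flat prior, so after n pulls with empirical mean m the
    posterior sample for an arm is drawn from N(m, 1/n).
    Returns the list of arm-pull counts.
    """
    rng = random.Random(seed)
    num_arms = len(means)
    counts = [0] * num_arms
    sums = [0.0] * num_arms

    def pull(arm):
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward

    # Pull each arm once so every posterior is well defined.
    for arm in range(num_arms):
        pull(arm)

    for _ in range(horizon - num_arms):
        samples = [
            rng.gauss(sums[a] / counts[a], (1.0 / counts[a]) ** 0.5)
            for a in range(num_arms)
        ]
        pull(max(range(num_arms), key=lambda a: samples[a]))
    return counts


def pull_fractions(means, horizon, seeds):
    """Fraction of pulls given to arm 0, one value per seed."""
    return [thompson_sampling(means, horizon, s)[0] / horizon for s in seeds]


unique_optimum = pull_fractions([1.0, 0.0], horizon=2000, seeds=range(5))
tied_optima = pull_fractions([0.0, 0.0], horizon=2000, seeds=range(5))
print("unique optimal arm:", unique_optimum)
print("two optimal arms:  ", tied_optima)
```

Going by the abstract, one would expect the first list to be tightly concentrated and the second to vary noticeably from seed to seed; a finite-horizon simulation like this can only hint at the asymptotic statement.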


Consistent Projection of Langevin Dynamics: Preserving Thermodynamics and Kinetics in Coarse-Grained Models

Nateghi, Vahid, Neureither, Lara, Moqvist, Selma, Hartmann, Carsten, Olsson, Simon, Nüske, Feliks

arXiv.org Artificial Intelligence

Coarse graining (CG) is an important task for efficient modeling and simulation of complex multi-scale systems, such as the conformational dynamics of biomolecules. This work presents a projection-based coarse-graining formalism for general underdamped Langevin dynamics. Following the Zwanzig projection approach, we derive a closed-form expression for the coarse-grained dynamics. In addition, we show how the generator Extended Dynamic Mode Decomposition (gEDMD) method, which was developed in the context of Koopman operator methods, can be used to model the CG dynamics and evaluate its kinetic properties, such as transition timescales. Finally, we combine our approach with thermodynamic interpolation (TI), a generative approach to transform samples between thermodynamic conditions, to extend the scope of the approach across thermodynamic states without repeated numerical simulations. Using a two-dimensional model system, we demonstrate that the proposed method accurately captures the thermodynamic and kinetic properties of the full-space model.
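For readers unfamiliar with the full-space model being coarse-grained, the sketch below integrates one-dimensional underdamped Langevin dynamics, dq = (p/m) dt and dp = -V'(q) dt - γ p dt + sqrt(2 γ m kT) dW, with a semi-implicit Euler-Maruyama step. It does not implement the projection formalism or gEDMD; the double-well potential, parameter values, and function names are assumptions chosen only for illustration.

```python
import math
import random


def simulate_underdamped_langevin(grad_potential, q0, p0, friction, kT,
                                  dt, num_steps, seed=0, mass=1.0):
    """Semi-implicit Euler-Maruyama for underdamped Langevin dynamics.

    The momentum is updated first, then the position uses the new momentum.
    Returns the list of positions, including the initial one.
    """
    rng = random.Random(seed)
    q, p = q0, p0
    positions = [q]
    # Standard deviation of the momentum noise added in one step.
    noise_scale = math.sqrt(2.0 * friction * mass * kT * dt)
    for _ in range(num_steps):
        force = -grad_potential(q)
        p = p + (force - friction * p) * dt + noise_scale * rng.gauss(0.0, 1.0)
        q = q + (p / mass) * dt
        positions.append(q)
    return positions


def double_well_grad(q):
    """Gradient of the hypothetical potential V(q) = (q**2 - 1)**2."""
    return 4.0 * q * (q * q - 1.0)


trajectory = simulate_underdamped_langevin(
    double_well_grad, q0=0.5, p0=0.0, friction=1.0, kT=0.5,
    dt=0.01, num_steps=5000, seed=1,
)
print(len(trajectory), min(trajectory), max(trajectory))
```

Setting kT to zero switches off the noise, in which case the trajectory simply relaxes into the nearest well.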


Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution

Espana, Tomas, Hafsi, Yadh, Lillo, Fabrizio, Vittori, Edoardo

arXiv.org Artificial Intelligence

We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to execute incrementally large orders while minimizing implementation shortfall and market impact over an extended period of time. Departing from traditional parametric approaches to price dynamics and impact modeling, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that encompass transient price impact, and nonlinear and dynamic order flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulation results show that the agent learns a policy that is both strategic and tactical, adapting effectively to order book conditions and outperforming standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
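The abstract names a Double Deep Q-Network agent but gives no implementation details. As a reminder of the standard Double DQN idea only, the sketch below computes the bootstrap target for a single transition: the online network selects the next action and the target network evaluates it. The function name, argument names, and numbers are hypothetical, and the Q-values are passed in as plain lists rather than produced by neural networks.

```python
def double_dqn_target(reward, discount, done, q_online_next, q_target_next):
    """Double DQN target for one transition.

    q_online_next: online-network Q-values at the next state (one per action).
    q_target_next: target-network Q-values at the next state (one per action).
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while action evaluation uses the target network.
    return reward + discount * q_target_next[best_action]


target = double_dqn_target(
    reward=1.0, discount=0.9, done=False,
    q_online_next=[1.0, 3.0, 2.0],
    q_target_next=[0.5, 0.2, 0.9],
)
print(target)
```

Here the online network prefers action 1, so the target bootstraps from the target network's value for that action (0.2) rather than from its own maximum (0.9), which is the decoupling that distinguishes Double DQN from plain DQN.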






Data-driven approximation of transfer operators for mean-field stochastic differential equations

Ioannou, Eirini, Klus, Stefan, Reis, Gonçalo dos

arXiv.org Machine Learning

Mean-field stochastic differential equations, also called McKean–Vlasov equations, are the limiting equations of interacting particle systems with fully symmetric interaction potential. Such systems play an important role in a variety of fields ranging from biology and physics to sociology and economics. Global information about the behavior of complex dynamical systems can be obtained by analyzing the eigenvalues and eigenfunctions of associated transfer operators such as the Perron–Frobenius operator and the Koopman operator. In this paper, we extend transfer operator theory to McKean–Vlasov equations and show how extended dynamic mode decomposition and the Galerkin projection methodology can be used to compute finite-dimensional approximations of these operators, which allows us to compute spectral properties and thus to identify slowly evolving spatiotemporal patterns or to detect metastable sets. The results will be illustrated with the aid of several guiding examples and benchmark problems including the Cormier model, the Kuramoto model, and a three-dimensional generalization of the Kuramoto model.
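To make the connection between mean-field equations and interacting particle systems concrete, the sketch below simulates N particles following dX_i = -κ (X_i - mean(X)) dt + σ dW_i with an Euler-Maruyama step, where the empirical mean stands in for the law-dependent term. This toy linear model is an assumption made for illustration; it is not one of the paper's benchmarks (Cormier, Kuramoto), and it does not compute any transfer operator. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_mean_field_particles(initial_positions, attraction, noise,
                                  dt, num_steps, seed=0):
    """Euler-Maruyama for dX_i = -attraction * (X_i - mean(X)) dt + noise dW_i.

    Every particle interacts with the others only through the empirical mean,
    so the interaction is fully symmetric.
    Returns the particle positions after num_steps steps.
    """
    rng = random.Random(seed)
    particles = list(initial_positions)
    sqrt_dt = math.sqrt(dt)
    for _ in range(num_steps):
        mean = sum(particles) / len(particles)
        particles = [
            x - attraction * (x - mean) * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
            for x in particles
        ]
    return particles


start = [random.Random(42).uniform(-3.0, 3.0) for _ in range(200)]
final = simulate_mean_field_particles(
    start, attraction=1.0, noise=0.5, dt=0.01, num_steps=1000, seed=7,
)
print(sum(final) / len(final))
```

With the noise set to zero the particles contract onto their common mean, which gives a quick sanity check of the drift term.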